Classifying HIV risk tweets using tweets from San Diego county
Abstract
This project aims to characterize and classify tweets in which users reveal HIV risk behaviour on the social networking site Twitter. The dataset was a labeled set of tweets obtained from doctors at UCSD’s Anti Viral Research Center (AVRC). To better understand the collected data and to build a good classification model, a series of exploratory data analysis (EDA) experiments was performed on the training dataset. The EDA phase revealed, as expected, which features were relevant and which were irrelevant. A comparative study was then performed on classification with models built using logistic regression and Support Vector Machines. We find that logistic regression outperformed Support Vector Machines when the domain-specific terms collected from domain experts were made part of the feature set.
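The abstract does not say how the expert-provided terms were encoded as features. As a minimal sketch of one possible approach, the function below adds two domain-term features to a plain bag-of-words representation. The names `tokenize`, `extract_features`, and `DOMAIN_TERMS`, as well as the placeholder lexicon entries, are assumptions made for illustration and do not come from the project.

```python
import re
from collections import Counter

# Hypothetical placeholder lexicon. The real expert-provided terms
# are not listed in the abstract.
DOMAIN_TERMS = {"placeholder_a", "placeholder_b"}


def tokenize(text):
    """Lowercase the text and split it into simple word-like tokens."""
    return re.findall(r"[a-z0-9_#@']+", text.lower())


def extract_features(text, domain_terms=DOMAIN_TERMS):
    """Return bag-of-words counts plus two domain-term features.

    This is an assumed encoding, not the one used in the project.
    """
    tokens = tokenize(text)
    # One count feature per token, prefixed to avoid name clashes.
    features = Counter(f"word={token}" for token in tokens)
    # Tokens that also appear in the domain lexicon.
    hits = [token for token in tokens if token in domain_terms]
    features["domain_term_count"] = len(hits)
    features["has_domain_term"] = int(bool(hits))
    return dict(features)
```

Feature dictionaries of this shape could then be vectorized (for example with scikit-learn's `DictVectorizer`) and passed to both a logistic regression model and a Support Vector Machine, so that the two are compared on the same feature set.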
Similar resources
Identifying Data Noises, User Biases, and System Errors in Geo-tagged Twitter Messages (Tweets)
Many social media researchers and data scientists collected geotagged tweets to conduct spatial analysis or identify spatiotemporal patterns of filtered messages for specific topics or events. This paper provides a systematic view to illustrate the characteristics (data noises, user biases, and system errors) of geo-tagged tweets from the Twitter Streaming API. First, we found that a small perc...
Detection of Twitter Users' Attitudes about Flu Vaccine based on the Content and Sentiment Analysis of the Sent Tweets
Introduction: The influenza vaccine is one of the controversial challenges in today's societies. Considering the importance of using the flu vaccine in preventing the spread of influenza virus, the Twitter network, as a rich source of data, provides suitable conditions for research in this field to examine the attitudes of different people about this vaccine. The results in one hand will help h...
HIV Risk on Twitter: the Ethical Dimension of Social Media Evidence-based Prevention for Vulnerable Populations
As of 2016 the HIV/AIDS epidemic is still a key public health problem. Recent reports showed that alarmingly high numbers of people in vulnerable populations are not reached by preventative efforts. Despite technology improvement, we are not yet able to identify populations that are most susceptible to HIV infections. In order to enable evidence-based prevention, we are studying new methods to ...
How Do You #relax When You’re #stressed? A Content Analysis and Infodemiology Study of Stress-Related Tweets
Background: Stress is a contributing factor to many major health problems in the United States, such as heart disease, depression, and autoimmune diseases. Relaxation is often recommended in mental health treatment as a frontline strategy to reduce stress, thereby improving health conditions. Twitter is a microblog platform that allows users to post their own personal messages (tweets), includin...
Publication date: 2015